Defining and predicting structurally conserved regions in protein superfamilies

Authors

  • Ivan K. Huang
  • Jimin Pei
  • Nick V. Grishin
Abstract

MOTIVATION: The structures of homologous proteins are generally better conserved than their sequences. This phenomenon is demonstrated by the prevalence of structurally conserved regions (SCRs) even in highly divergent protein families. Defining SCRs requires the comparison of two or more homologous structures and is affected by their availability and divergence, and our ability to deduce structurally equivalent positions among them. In the absence of multiple homologous structures, it is necessary to predict SCRs of a protein using information from only a set of homologous sequences and (if available) a single structure. Accurate SCR predictions can benefit homology modelling and sequence alignment.

RESULTS: Using pairwise DaliLite alignments among a set of homologous structures, we devised a simple measure of structural conservation, termed structural conservation index (SCI). SCI was used to distinguish SCRs from non-SCRs. A database of SCRs was compiled from 386 SCOP superfamilies containing 6489 protein domains. Artificial neural networks were then trained to predict SCRs with various features deduced from a single structure and homologous sequences. Assessment of the predictions via a 5-fold cross-validation method revealed that predictions based on features derived from a single structure perform similarly to ones based on homologous sequences, while combining sequence and structural features was optimal in terms of accuracy (0.755) and Matthews correlation coefficient (0.476). These results suggest that even without information from multiple structures, it is still possible to effectively predict SCRs for a protein. Finally, inspection of the structures with the worst predictions pinpoints difficulties in SCR definitions.

AVAILABILITY: The SCR database and the prediction server can be found at http://prodata.swmed.edu/SCR.

CONTACT: [email protected] or [email protected]

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics Online.
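The two figures of merit quoted in the abstract, accuracy and Matthews correlation coefficient (MCC), can both be computed from the counts of a binary confusion matrix. The following is a minimal sketch of those standard formulas, not the authors' evaluation code; the function names and the toy per-residue labels (1 = SCR, 0 = non-SCR) are assumptions made for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = SCR, 0 = non-SCR)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of positions predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, invented for this example only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", matthews_corrcoef(*counts))
```

Unlike accuracy, MCC stays near zero for a predictor that labels every position with the majority class, which is why the two values are usually reported together when the classes are imbalanced.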

Similar resources

In silico Analysis of Pasteurella multocida PlpE Protein Epitopes As Novel Subunit Vaccine Candidates

Background: Pasteurella multocida is a Gram-negative, non-motile, non-spore-forming, aerobic/anaerobic coccobacillus known as the causative agent of human and animal diseases. Humans are often infected through cat scratches or bites, which may lead to soft tissue infections and, in rare cases, to bacteremia and septicemia. Commercial vaccines against this agent include inactivated, live attenuated,...

PASS2: A Database of Structure-Based Sequence Alignments of Protein Structural Domain Superfamilies

Sequence alignments guided by structural features are particularly suited for distant relationships and they permit a better sampling of the protein sequence space. Reliable sequence alignments could be useful in evolutionary biology and in defining structure-function relationships for protein superfamilies. The PASS2 database presents structure-based alignments of protein domains related at the sup...

SMoS: a database of structural motifs of protein superfamilies.

The Structural Motifs of Superfamilies (SMoS) database provides information about the structural motifs of aligned protein domain superfamilies. Such motifs among structurally aligned multiple members of protein superfamilies are recognized by the conservation of amino acid preference and solvent inaccessibility and are examined for the conservation of other features like secondary structural c...

The CATH Hierarchy Revisited—Structural Divergence in Domain Superfamilies and the Continuity of Fold Space

This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionarily conserved core, diverse elaborations to this core...

Diversity in protein domain superfamilies

Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfa...

Journal:
  • Bioinformatics

Volume 29, Issue 2

Pages: -

Publication date: 2013